Abstract: A growing body of research leverages social network
based trust relationships to improve system functionality.
However, these systems expose users' trust relationships,
which are considered sensitive information in today's
society, to an adversary.

In this work, we make the following contributions. First, we 
propose an algorithm that perturbs the structure of a social 
graph in order to provide link privacy, at the cost of slight 
reduction in the utility of the social graph. Second, we define
general metrics for characterizing the utility and privacy of 
perturbed graphs. Third, we evaluate the utility and privacy of 
our proposed algorithm using real world social graphs. Finally, 
we demonstrate the applicability of our perturbation algorithm 
on a broad range of secure systems, including Sybil defenses 
and secure routing. 

I. Introduction 

In recent years, several proposals have been put forward
that leverage users' social network trust relationships to
improve system security and privacy. Social networks have
been used for Sybil defense, secure routing [13], [19], [16],
secure reputation systems [29], mitigating spam [18],
censorship resistance [27], and anonymous
communication [4], [22].

A significant barrier to the deployment of these systems 
is that they do not protect the privacy of users' trusted
social contacts. Information about users' trust relationships
is considered sensitive in today's society; in fact, existing
online social networks such as Facebook, Google+ and
LinkedIn provide explicit mechanisms to limit access to this
information. A recent study by Dey et al. [6] found that
more than 52% of Facebook users hide their social contact 
information. 

Most protocols that leverage social networks for system 
security and privacy either explicitly reveal users' trust
relationships to an adversary [5] or allow the adversary to
easily perform traffic analysis and infer these trust
relationships [32]. Thus the design of these systems is fundamentally
in conflict with the current online social network paradigm, 
and hinders deployment. 

In this work, we focus on protecting the privacy of users' 
trusted contacts (edge/link privacy, not vertex privacy) while 
still maintaining the utility of higher level systems and 
applications that leverage the social graph. Our key insight 
in this work is that for a large class of security applications 
that leverage social relationships, preserving the exact set 
of edges in the graph is not as important as preserving the
graph-theoretic structural differences between the honest and
dishonest users in the system.

This insight motivates a paradigm of structured graph 
perturbation, in which we introduce noise in the social graph 
(by deleting real edges and introducing fake edges) such that 
the local structures in the original social graph are preserved. 
We believe that for many applications, introducing a high 
level of noise in such a structured fashion does not reduce 
the overall system utility. 

A. Contributions 

In this work, we make the following contributions. 

• First, we propose a mechanism based on random walks
for perturbing the structure of the social graph that
provides link privacy at the cost of a slight reduction
in application utility (Section IV).
• We define a general metric for characterizing the utility
of perturbed graphs. Our utility definition considers the
change in graph structure from the perspective of a
vertex. We formally relate our notion of utility to global
properties of social graphs, such as mixing times and
second largest eigenvalue modulus of graphs, and analyze
the utility properties of our perturbation mechanism
using real world social networks (Section V).
• We define several metrics for characterizing link privacy,
and consider prior information that an adversary may
have for de-anonymizing links. We also formalize the
relationship between utility and privacy of perturbed
graphs, and analyze the privacy properties of our
perturbation mechanism using real world social networks
(Section VI).
• Finally, we experimentally demonstrate the real world
applicability of our perturbation mechanism on a broad
range of secure systems, including Sybil defenses and
secure routing (Section VII). In fact, we find that for
Sybil defenses, our techniques are of interest even
outside the context of link privacy.

II. Related Work 

Work in this space can be broadly classified into two 
categories: (a) mechanisms for protecting the privacy of links 
between labeled vertices, and (b) mechanisms for protecting 
node/graph privacy when vertices are unlabeled. The focus of 
this work is on protecting the privacy of relationships among 
labeled vertices. 



A. Link privacy between labeled vertices 

There are two main mechanisms for preserving link pri- 
vacy between labeled vertices. The first approach is to 
perform clustering of vertices and edges, and aggregate them 
into super vertices (e.g., [1] and [34]). In this way,
information about corresponding sub-graphs can be anonymized.
While these clustering approaches permit analysis of some 
macro-level graph properties, they are not suitable for black- 
box application of existing social network based applications, 
such as Sybil defenses. The second class of approaches aims
to introduce perturbation in the social graph by adding and 
deleting edges and vertices. Next, we discuss this line of 
research in more detail. 

Hay et al. [11] propose a perturbation algorithm which
applies a sequence of k edge deletions followed by k random 
edge insertions. Candidates for edge deletion are sampled 
uniformly at random from the space of existing edges in 
graph G, while candidates for edge insertion are sampled 
uniformly at random from the space of edges not in G. The 
key difference between our perturbation mechanism and that 
of Hay et al. is that we sample edges for insertion based on 
the structure of the original graph (as opposed to random 
selection). We will compare our approach with that of Hay 
et al. in Section VI.

Ying and Wu [31] study the impact of Hay et al.'s
perturbation algorithms [11] on the spectral properties of
graphs, as well as on link privacy. They also propose a 
new perturbation algorithm that aims to preserve the spectral 
properties of graphs, but do not analyze its privacy properties. 

Korolova et al. [12] show that link privacy of the overall
social network can be breached even if information about 
the local neighborhood of social network nodes is leaked 
(for example, via a look-ahead feature for friend discovery). 

B. Anonymizing the vertices 

Although the techniques described above reveal the iden- 
tity of the vertices in the social graph but add noise to the 
relationships between them, there have been various works 
in the literature that aim at anonymizing the identities of the 
nodes in the social network. This line of research is orthog- 
onal to our goals, but we describe them for completeness. 

The straightforward approach of just removing the identi- 
fiers of the nodes before publishing the social graph does not 
always guarantee privacy, as shown by Backstrom et al. [2].
To deal with this problem, Liu and Terzi [14] propose a
systematic framework for identity anonymization on graphs,
where they introduce the notion of k-degree anonymity. Their
goal is to minimally modify the graph by changing the
degrees of specially-chosen nodes so that the identity of each
individual involved is protected. An efficient version of their
algorithm was recently implemented by Lu et al. [15].

Another notion of graph anonymity in social networks is
presented by Pei and Zhou [35]: a graph is k-anonymous if
for every node there exist at least k - 1 other nodes that share
isomorphic neighborhoods. This is a stronger definition than
the one in [14], where only vertex degrees are considered.

Zhou and Pei [36] recently introduced another notion
called l-diversity for social network anonymization. In this
case, each vertex is associated with some non-sensitive at- 
tributes and some sensitive attributes. Maintaining the privacy 
of the individual in this scenario is based on the adversary not 
being able (with high probability) to re-identify the sensitive 
attribute values of the individual. 

Finally, Narayanan and Shmatikov [23] show some of the
weaknesses of the above anonymization techniques, propose
a generic way for modeling the release of anonymized social 
networks and report on successful de-anonymization attacks 
on popular networks such as Flickr and Twitter. 

C. Differential privacy and social networks 

Sala et al. [26] use differential privacy (a more elaborate 
tool of adding noise) to publish social networks with privacy 
guarantees. Given a social network and a desired level of 
differential privacy guarantee, they extract a detailed structure 
into degree correlation statistics, introduce noise into the 
resulting dataset, and generate a new synthetic social network 
with differential privacy. However, their approach does not 
preserve utility of the social graph from the perspective of 
a vertex (since vertices in their graph are unlabeled), and 
thus cannot be used for many real world applications such 
as Sybil defenses. 

Also, Rastogi et al. [25] introduce a relaxed notion of
differential privacy for data with relationships so that more 
expressive queries (e.g., joins) can be supported without 
hurting utility very much. 

D. Link privacy preserving applications 

X-Vine [19] proposes to perform DHT routing using social
links, in a manner that preserves the privacy of social links. 
However, the threat model in X-Vine excludes adversaries 
that have prior information about the social graph. Thus in 
real world settings, X-Vine is vulnerable to the Narayanan-
Shmatikov attack [23]. Moreover, the techniques in X-Vine
are specific to DHT routing, and cannot be used to design a 
general purpose defense mechanism for social network based 
applications, which is the focus of this work. 

III. Basic Theory 

Before we introduce our perturbation mechanism, we 
present some notation and background on graph theory 
needed to understand the paper. 

Let us denote the social graph as G = (V, E), comprising
the set of vertices V (without loss of generality, assume the
vertices have labels 1, ..., n), and the set of edges E, where
|V| = n and |E| = m. The focus of this paper is on undirected
graphs, where the edges are symmetric. Let A_G denote the
n × n adjacency matrix corresponding to the graph G, namely
if (i, j) ∈ E, then A_ij = 1, otherwise A_ij = 0.



A random walk on a graph G starting at a vertex v is a
sequence of vertices comprising a random neighbor v_1 of v,
then a random neighbor v_2 of v_1, and so on. A random walk
on a graph can be viewed as a Markov chain. We denote
the transition probability matrix of the random walk/Markov
chain as P, given by:

\[
P_{ij} = \begin{cases} \dfrac{1}{\deg(i)} & \text{if } (i, j) \text{ is an edge in } G, \\ 0 & \text{otherwise,} \end{cases} \tag{1}
\]

where deg(i) denotes the degree of the vertex i. At any
given iteration t of the random walk, let us denote with π(t)
the probability distribution of the random walk state at that
iteration (π(t) is a vector of n entries). The state distribution
after t iterations is given by π(t) = π(0) · P^t, where π(0)
is the initial state distribution. The probability of a t-hop
random walk starting from i and ending at j is given by P^t_ij.

For irreducible and aperiodic graphs (which undirected
and connected social graphs are), the corresponding Markov
chain is ergodic, and the state distribution of the random walk
π(t) converges to a unique stationary distribution denoted by
π. The stationary distribution π satisfies π = π · P.

For undirected and connected social graphs, we can see
that the probability distribution π_i = deg(i)/(2m) satisfies the
equation π = π · P, and is thus the unique stationary
distribution of the random walk.

Let us denote the eigenvalues of A as λ_1 ≥ λ_2 ≥ ... ≥ λ_n,
and the eigenvalues of P as ν_1 ≥ ν_2 ≥ ... ≥ ν_n. The
eigenvalues of both A and P are real. We denote the second
largest eigenvalue modulus (SLEM) of the transition matrix
P as μ = max(|ν_2|, |ν_n|). The eigenvalues of matrices A
and P are closely related to structural properties of graphs,
and are considered utility metrics in the literature.
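
The quantities defined in this section can be computed directly for a small graph. The following is a minimal sketch, assuming a dense NumPy adjacency matrix of a connected, non-bipartite undirected graph; the function names and the toy graph are illustrative choices and not part of the paper.

```python
import numpy as np


def transition_matrix(adj):
    """Random-walk matrix P with P[i, j] = 1/deg(i) if (i, j) is an edge."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def walk_distribution(adj, start, t):
    """State distribution pi(t) = pi(0) P^t for a walk started at `start`."""
    P = transition_matrix(adj)
    pi0 = np.zeros(P.shape[0])
    pi0[start] = 1.0
    return pi0 @ np.linalg.matrix_power(P, t)


def stationary_distribution(adj):
    """pi_i = deg(i) / (2m) for an undirected, connected graph."""
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    return deg / deg.sum()  # deg.sum() equals 2m


def slem(adj):
    """Second largest eigenvalue modulus of P.

    P = D^-1 A is similar to the symmetric matrix D^-1/2 A D^-1/2, so both
    share the same real eigenvalues and a symmetric eigensolver can be used.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    sym = adj / np.sqrt(np.outer(deg, deg))
    eig = np.linalg.eigvalsh(sym)  # eigenvalues in ascending order
    return max(abs(eig[-2]), abs(eig[0]))


# Toy graph (assumption): triangle 0-1-2 with pendant vertex 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
P = transition_matrix(A)
pi = stationary_distribution(A)
print(np.allclose(pi @ P, pi))                          # stationarity check
print(np.abs(walk_distribution(A, 0, 100) - pi).max())  # gap after 100 hops
print(slem(A))
```

Using the symmetric normalization keeps the eigenvalues real in floating point, which a general eigensolver applied to P would not guarantee.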

IV. Structured Perturbation 

A. System Model and Goals 

For the deployment of secure applications that leverage 
users' trust relationships, we envision a scenario where these
applications bootstrap users' trust relationships using existing
online social networks such as Facebook or Google+. 

However, most applications that leverage this information 
do not make any attempt to hide it; thus an adversary can 
exploit protocol messages to learn the entire social graph. 

Our vision is that OSNs can support these applications 
while protecting the link privacy of users by introducing noise 
in the social graph. Of course the addition of noise must 
be done in a manner that still preserves application utility. 
Moreover the mechanism for introducing noise should be 
computationally efficient, and must not present undue burden 
to the OSN operator. 

We need a mechanism that takes the social graph G as an
input, and produces a transformed graph G' = (V, E'), such
that the vertices in G' remain the same as the original input
graph G, but the set of edges is perturbed to protect link
privacy. The constraint on the mechanism is that application
utility of systems that leverage the perturbed graph should 
be preserved. Conventional metrics of utility include degree 
sequence and graph eigenvalues; we will shortly define a 
general metric for utility of perturbed graphs in the following 
section. There is a tradeoff between privacy of links in the 
social graph and the utility derived out of perturbed graphs. 
As more and more noise is added to the social graph, the link 
privacy increases, but the corresponding utility decreases. 

B. Perturbation Algorithm 

Let t be the parameter that governs how much noise we
wish to inject in the social graph. We propose that for each
node u in graph G, we perturb all of its contacts as follows.
Suppose that node v is a social contact of node u. Then we
perform a random walk of length t - 1 starting from node
v. Let node z denote the terminus of the random walk.
Our main idea is that instead of the edge (u, v), we will
introduce the edge (u, z) in the graph G'. It is possible that
the random walk terminates at node u itself, or that
node z is already a social contact of u in the transformed
graph G' (due to a previously added edge). To avoid self
loops and duplicate edges, we perform another random walk
from vertex v until a suitable terminus vertex is found, or
we reach a threshold number of tries, denoted by parameter
M.

For undirected graphs, the algorithm described so far
would double the number of edges in the perturbed graphs:
for each edge (u, v) in the original graph, an edge would
be added between vertex u and the terminus of the
random walk from vertex v, as well as between vertex v
and the terminus of the random walk from vertex u.
To preserve the degree distribution, we could add an edge
between vertex u and vertex z in the transformed graph with
probability 0.5. However, this could lead to low degree nodes
becoming disconnected from the social graph with non-trivial
probability. To account for this case, we add the first edge
corresponding to the vertex u with probability 1, while the
remaining edges are accepted with a reduced probability to
preserve the degree distribution. The overall algorithm is
depicted in Algorithm 1. The computational complexity of
our algorithm is O(m).
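
The reduced acceptance probability for the remaining edges follows from the degree-preservation requirement. As a short worked derivation, assume that each vertex u should retain deg(u)/2 of its outgoing walks in expectation (the proof of Theorem 3 relies on this), and that every attempt finds a suitable terminus. The symbol p is introduced here for the reduced probability and is not notation from the paper.

```latex
\[
1 + (\deg(u) - 1)\,p = \frac{\deg(u)}{2}
\quad\Longrightarrow\quad
p = \frac{0.5 \cdot \deg(u) - 1}{\deg(u) - 1},
\qquad \deg(u) \ge 2 .
\]
```

For example, deg(u) = 2 gives p = 0 (only the first edge is kept), and deg(u) = 10 gives p = 4/9.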

C. Visual depiction of algorithm 

For our evaluation, we consider two real world social
network topologies: (a) the Facebook friendship graph from the
New Orleans regional network [30], comprising
63,392 users that have 816,886 edges amongst them, and (b)
the Facebook interaction graph from the New Orleans regional
network [30], comprising 43,953 users that have
182,384 edges amongst them. Mohaisen et al. [21] found
that pre-processing social graphs to exclude low degree
nodes significantly changes the graph theoretic
characteristics. Therefore, we did not pre-process the datasets
in any way.



Fig. 1. Facebook dataset link topology: (a) original graph, (b) perturbed, t=5, (c) t=10, (d) t=15, and (e) t=20. The color coding in (a) is derived using a
modularity based community detection algorithm. For the remaining panels, the color coding of vertices is the same as in (a). We can see that short random
walks preserve the community structure of the social graph, while introducing a significant amount of noise.



Algorithm 1 Transform(G, t, M): Perturb undirected graph
G using perturbation parameter t and maximum loop count M.
G' = null;
foreach vertex u in G
    let count = 1;
    foreach neighbor v of vertex u
        let loop = 1;
        do
            perform a (t - 1)-hop random walk from vertex v;
            let z denote the terminal vertex of the random walk;
            loop++;
        while (u = z ∨ (u, z) ∈ G') ∧ (loop < M)
        if loop < M
            if count = 1
                add edge (u, z) in G';
            else
                let deg(u) denote the degree of u in G;
                add edge (u, z) in G' with probability
                    (0.5 · deg(u) - 1) / (deg(u) - 1);
            count++;
return G';
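
Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-dictionary representation, the names `random_walk` and `perturb_graph`, the seeded generator, and the reading of M as "at most M attempts per edge" are choices made here. The acceptance probability (0.5 · deg(u) - 1) / (deg(u) - 1) is the value implied by keeping the first edge with probability 1 while retaining deg(u)/2 edges per vertex in expectation.

```python
import random


def random_walk(nbrs, start, hops, rng):
    """Return the terminal vertex of a `hops`-step random walk from `start`."""
    cur = start
    for _ in range(hops):
        cur = rng.choice(nbrs[cur])  # uniform choice among neighbours
    return cur


def perturb_graph(adj, t, M, seed=None):
    """Sketch of Transform(G, t, M) for an undirected graph.

    adj maps each vertex to the set of its neighbours in G (no isolated
    vertices, no self loops). Returns the adjacency sets of G'.
    """
    rng = random.Random(seed)
    nbrs = {u: sorted(vs) for u, vs in adj.items()}  # lists for rng.choice
    new_adj = {u: set() for u in adj}
    for u in nbrs:
        deg_u = len(nbrs[u])
        count = 1
        for v in nbrs[u]:
            z = None
            for _ in range(M):  # at most M attempts per original edge
                cand = random_walk(nbrs, v, t - 1, rng)
                if cand != u and cand not in new_adj[u]:
                    z = cand
                    break
            if z is None:
                continue  # no suitable terminus found within M tries
            if count == 1:
                accept = True  # the first edge of u is always kept
            else:
                accept = rng.random() < (0.5 * deg_u - 1) / (deg_u - 1)
            if accept:
                new_adj[u].add(z)
                new_adj[z].add(u)
            count += 1
    return new_adj


# Toy input (assumption): a ring of 12 vertices.
ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
perturbed = perturb_graph(ring, t=3, M=10, seed=1)
```

Because the first accepted edge of every vertex is kept unconditionally, no vertex of the toy ring ends up isolated, which is the motivation the text gives for not simply accepting every edge with probability 0.5.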



Figure 1 depicts the original Facebook friendship graph,
and the perturbed graphs generated by our algorithm for
varying perturbation parameters, using a force directed
algorithm for depicting the graph. The color coding of nodes
in the figure was obtained by running a modularity based
community detection algorithm on the original Facebook
friendship graph, which yielded three communities. For the
perturbed graphs, we used the same color for the vertices
as in the original graph. This representation allows us to
visually see the perturbation in the community structure of
the social graph. We can see that for small values of the
perturbation parameter, the community structure (related to
utility) is strongly preserved, even though the edges between
vertices are randomized. As the perturbation parameter is
increased, the graph loses its community structure, and
eventually begins to resemble a random graph.

Figure 2 depicts a similar visualization for the Facebook
interaction graph. In this setting, we found two communities
using a modularity based community detection algorithm in
the original graph. We can see a similar trend in the Facebook
interaction graph as well: for small values of the perturbation
parameter, the community structure is somewhat preserved,
even though significant randomization has been introduced in 
the links. In the following sections, we formally quantify the 
utility and privacy properties of our perturbation mechanism. 

V. Utility 

In this section, we develop formal metrics to characterize 
the utility of perturbed graphs, and then analyze the utility 
of our perturbation algorithm. 
Fig. 2. Facebook dataset interaction topology: (a) original graph, (b) perturbed, t=5, (c) t=10, (d) t=15, and (e) t=20. We can see that short random walks
preserve the community structure of the social graph, while introducing a significant amount of noise.



A. Metrics 

One approach to measure utility would be to consider 
global graph theoretic metrics, such as the second largest 
eigenvalue modulus of the graph transition matrix P.
However, from a user's perspective, it may be the case that
a user's position in the perturbed graph relative to malicious
users is much worse, even though the global graph properties 
remain the same. This motivates our first definition of the 
utility of a perturbed graph from the perspective of a single 
user. 

Definition 1: The vertex utility of a perturbed graph G'
for a vertex v, with respect to the original graph G and an
application parameter l, is defined as the statistical distance
between the probability distributions induced by l-hop
random walks starting from vertex v in graphs G and G'.

\[
VU(v, G, G', l) = \mathrm{Distance}\big(P_v^l(G), P_v^l(G')\big) \tag{2}
\]

P^l_v denotes the v-th row of the matrix P^l. The parameter
l is linked to higher level applications that leverage social
graphs. For example, Sybil defense mechanisms exploit large
scale community structure of social networks, where the
application parameter l > 10. For other applications such
as recommendation systems, it may be more important to
preserve the local community characteristics, where l could
be set to a smaller value.



Random walks are intimately linked to the structure of 
communities and graphs, so it is natural to consider their use 
when defining utility of perturbed graphs. In fact, a lot of 
security applications directly exploit the trust relationships 
in social graphs by performing random walks themselves, 
such as Sybil defenses and anonymous communication. 

There are several ways to define statistical distance
between probability distributions [3]. In this work, we consider
the following three notions. The total variation distance
between two probability distributions is a measure of the
maximum difference between the probability distributions for
any individual element.

\[
\text{Variation Distance}(P, Q) = \|P - Q\|_{tvd} = \sup_i |p_i - q_i| \tag{3}
\]



As we will discuss shortly, the total variation distance is 
closely related to the computation of several graph properties 
such as mixing time and second largest eigenvalue mod- 
ulus. However, the total variation distance only considers 
the maximum difference between probability distributions 
corresponding to an element, and not the differences in prob- 
abilities corresponding to other elements of the distribution. 
This motivates the use of Hellinger distance, which is defined
as:

\[
\text{Hellinger Distance}(P, Q) = \frac{1}{\sqrt{2}} \cdot \sqrt{\sum_{i=1}^{n} \big(\sqrt{p_i} - \sqrt{q_i}\big)^2} \tag{4}
\]

The Hellinger distance is related to the Euclidean distance
between the square root vectors of P and Q. Finally, we
also consider the Jensen-Shannon distance measure, which
takes an information theoretic approach of averaging the
Kullback-Leibler divergence between P and Q, and between
Q and P (since Kullback-Leibler divergence by itself is not
symmetric).

\[
\text{Jensen-Shannon Distance}(P, Q) = \frac{1}{2} \sum_i p_i \log\frac{p_i}{q_i} + \frac{1}{2} \sum_i q_i \log\frac{q_i}{p_i} \tag{5}
\]

Using these notions, we can compute the utility of the 
perturbed graph with respect to an individual vertex (vertex
utility). Note that a lower value of VU(v, G, G', l)
corresponds to higher utility (we want distance between 
probability distributions over original graph and perturbed 
graph to be low). Using the concept of vertex utility, we can 
define metrics for overall utility of a perturbed graph. 

Definition 2: The overall mean vertex utility of a
perturbed graph G' with respect to the original graph G and an
application parameter l is defined as the mean utility over all
vertices in G. Similarly, the max vertex utility (worst case) of
a perturbed graph G' is defined by computing the maximum
of the utility values over all vertices in G.

\[
VU_{mean}(G, G', l) = \frac{\sum_{v \in V} \mathrm{Distance}\big(P_v^l(G), P_v^l(G')\big)}{|V|} \tag{6}
\]

\[
VU_{max}(G, G', l) = \max_{v \in V} \mathrm{Distance}\big(P_v^l(G), P_v^l(G')\big) \tag{7}
\]

The notion of max vertex utility is particularly
interesting, especially in conjunction with the use of total variation
distance. This is because of its relationship to global graph 
metrics such as mixing times and second largest eigenvalue 
modulus, which we demonstrate next. Our analysis shows the 
generality of our formal definition for utility. 
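
The vertex utility metrics can be sketched for small graphs as follows. This is an illustrative sketch only: it assumes dense NumPy adjacency matrices over the same vertex set with no isolated vertices, the function names are hypothetical, and the small constant added before taking logarithms is an assumption made here to avoid log(0) (the paper does not say how zero probabilities are handled). The three distances follow the definitions given in the text: per-element maximum difference, Hellinger, and the averaged Kullback-Leibler divergence.

```python
import numpy as np


def l_hop_matrix(adj, l):
    """Return P^l, whose v-th row is the l-hop walk distribution from v."""
    adj = np.asarray(adj, dtype=float)
    P = adj / adj.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, l)


def sup_distance(p, q):
    """Maximum per-element difference between p and q."""
    return np.max(np.abs(p - q))


def hellinger(p, q):
    """Hellinger distance between p and q."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)


def averaged_kl(p, q, eps=1e-12):
    """Average of KL(p || q) and KL(q || p); eps smoothing is an assumption."""
    p = p + eps
    q = q + eps
    return 0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p))


def vertex_utilities(adj_g, adj_gp, l, distance):
    """Return (VU_mean, VU_max) between graph G and perturbed graph G'."""
    Pg = l_hop_matrix(adj_g, l)
    Pgp = l_hop_matrix(adj_gp, l)
    vu = np.array([distance(Pg[v], Pgp[v]) for v in range(Pg.shape[0])])
    return vu.mean(), vu.max()


# Toy graphs (assumptions): a triangle with a pendant vertex, and a 4-cycle.
G = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
G_prime = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
vu_mean, vu_max = vertex_utilities(G, G_prime, l=2, distance=hellinger)
print(vu_mean, vu_max)
```

Passing the distance as a function keeps the mean and max computations identical across the three notions, so results for different distances can be compared directly.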

B. Metrics Analysis 

Towards this end, we first introduce the notion of mixing
time of a Markov process. The mixing time of a Markov
process is a measure of the minimum number of steps needed
to converge to its unique stationary distribution. Formally, the
mixing time of a graph G is defined as:

\[
\tau_G(\epsilon) = \max_i \min \big\{ t : \| P_i^t(G) - \pi \|_{tvd} < \epsilon \big\} \tag{8}
\]
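
The mixing time definition can be evaluated by brute force on small graphs. The sketch below is an assumption-laden illustration: it uses a dense NumPy adjacency matrix, measures distance as the per-element maximum difference from the stationary distribution, starts counting at t = 1, and gives up after a hypothetical `max_steps` bound (for instance on bipartite graphs, where the walk never converges).

```python
import numpy as np


def mixing_time(adj, eps, max_steps=10_000):
    """Max over start vertices of the first step t at which the walk
    distribution is within eps of the stationary distribution."""
    A = np.asarray(adj, dtype=float)
    deg = A.sum(axis=1)
    P = A / deg[:, None]
    pi = deg / deg.sum()
    n = A.shape[0]
    first_hit = np.zeros(n, dtype=int)  # 0 means "not yet within eps"
    Pt = np.eye(n)
    for t in range(1, max_steps + 1):
        Pt = Pt @ P  # row i is the t-hop distribution from vertex i
        dist = np.max(np.abs(Pt - pi), axis=1)
        newly = (first_hit == 0) & (dist < eps)
        first_hit[newly] = t
        if np.all(first_hit > 0):
            return int(first_hit.max())
    return None  # did not converge within max_steps


# Toy graphs (assumptions): the complete graph K4 and the bipartite 4-cycle.
K4 = np.ones((4, 4)) - np.eye(4)
C4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(mixing_time(K4, 0.1), mixing_time(C4, 0.1, max_steps=50))
```

Recording the first hitting step per start vertex, rather than the first step at which all rows are close, matches the max-min form of the definition even when the distance oscillates between odd and even steps.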



The following two theorems illustrate the bound on global 
properties of the perturbed graph, using the global properties 
of the original graph, and the utility metric. To improve 
readability of the paper, we defer the proofs of these theorems 
to the Appendix. 

Theorem 1: Let us denote the max (worst case) vertex
utility distance between the perturbed graph G' and
the original graph G by VU_max(G, G', l), computed as
VU_max(G, G', l) = max_{v ∈ V} VU(v, G, G', l). Then we
have that τ_G'(ε + VU_max(G, G', τ_G(ε))) ≤ τ_G(ε).

Theorem 1 relates the mixing time of the perturbed graph
to the mixing time of the original graph and the max
vertex utility metric, for application parameter l = τ_G(ε).

Theorem 2: Let us denote the second largest eigenvalue
modulus (SLEM) of transition matrix P_G of graph G as μ_G.
We can bound the SLEM of a perturbed graph G' using the
mixing time of the original graph, and the worst case vertex
utility distance between the graphs as follows:

\[
1 - \frac{\log n + \log\left(\frac{1}{\epsilon + VU_{max}(G, G', \tau_G(\epsilon))}\right)}{\tau_G(\epsilon)}
\;\le\; \mu_{G'} \;\le\;
\frac{2\tau_G(\epsilon)}{2\tau_G(\epsilon) + \log\left(\frac{1}{2\epsilon + 2\,VU_{max}(G, G', \tau_G(\epsilon))}\right)}
\]

Theorem 2 relates the second largest eigenvalue modulus
of the perturbed graph to the mixing time of the original
graph and the worst case vertex utility metric, for application
parameter l = τ_G(ε).

These theorems show the generality of our utility
definitions. Mechanisms that provide good utility (have low values
of VU_max) introduce only a small change in the mixing time
and SLEM of perturbed graphs.

C. Algorithm Analysis 

Our above results show the general relationship between 
our utility metrics and global graph properties (which hold for 
any perturbation algorithm). Next, we analyze the properties 
of our proposed perturbation algorithm. 

First, we empirically compute the mean vertex utility of
the perturbed graphs (VU_mean), for varying perturbation
parameters and varying application parameters. Figure 3
depicts the mean vertex utility for the Facebook interaction
and friendship graphs using the Jensen-Shannon information
theoretic distance metric. We can see that as the perturbation
parameter increases, the distance metric increases. This is not
surprising, since additional noise will increase the distance
between probability distributions computed from original and
perturbed graphs. We can also see that as the application
parameter l increases, the distance metric decreases. This
illustrates that our perturbation algorithm is ideally suited for
security applications that rely on local or global community
structures, as opposed to applications that require exact
information about one or two hop neighborhoods. We can
see a similar trend when using Hellinger distance to compute
the distance between probability distributions, as shown in
Figure 4.

Fig. 3. Jensen-Shannon distance between transient probability distributions for original graph and transformed graph using (a) Facebook interaction
graph (b) Facebook wall post graph. We can see that as the original graph is perturbed to a larger degree, the distance between original and transformed
transient distributions increases, decreasing application utility.

Fig. 4. Hellinger distance between transient probability distributions for original graph and transformed graph using (a) Facebook interaction graph
(b) Facebook wall post graph. We can see that even with different notions of distance between probability distributions (Hellinger/Jensen-Shannon), the
distance between original graph and perturbed graph monotonically increases depending on the perturbation degree.

Theorem 3: The expected degree of each node after the
perturbation algorithm is the same as in the original graph:

∀v ∈ V, E(deg(v)') = deg(v), where deg(v)' denotes the
degree of vertex v in G'.

Proof: In expectation, half the degree of any node
v is preserved via outgoing random walks from v in the
perturbation process. To prove the theorem, we need to show
that each node v is the terminal point of deg(v)/2 random
walks in the perturbation mechanism (on average). From the
time reversibility property of the random walks, we have that
P^t_ij π_i = P^t_ji π_j. Thus for any node i, the incoming probability
of a random walk starting from node j is P^t_ji = P^t_ij · π_i/π_j,
i.e., it is proportional to the node degree of i. Thus the
expected number of random walks terminating at node i in
the perturbation algorithm is given by Σ_{v ∈ V} deg(v) · P^{t-1}_vi.
This is equivalent to Σ_{v ∈ V} P^{t-1}_iv · deg(i) = deg(i). Since
half of these walks will be added to the graph G' on average,
we have that E(deg(v)') = deg(v). ■

Corollary 1: The expected value of the largest
eigenvalue of the transformed graph is bounded as

\[
\max\big(d_{avg}, \sqrt{d_{max}}\big) \le E(\lambda'_1) \le d_{max}
\]

From the Perron-Frobenius theorem, we have that the
largest eigenvalue of the graph is related to the notion of
average graph degree as follows:

\[
\max\big(d_{avg}, \sqrt{d_{max}}\big) \le \lambda_1 \le d_{max} \tag{9}
\]

where d_avg and d_max denote the average and maximum
vertex degree. Taking expectation on the above equation, and
using the previous theorem yields the corollary.

Next, we show experimental results validating our theorem. 
Figure 5 depicts the node degrees of the original graphs, and
expected node degrees of the perturbed graphs, corresponding 
to all nodes in the Facebook interaction and friendship 
graphs. In this figure, the points in a vertical line for different 
perturbed graphs correspond to the same node index. We 
can see that the degree distributions are nearly identical, 
validating our theoretical results. 

Theorem 4: Using our perturbation algorithm, the mixing
time of the perturbed graphs is related to the mixing time of
the original graph as follows: τ_G(ε)/t ≤ E(τ_G'(ε)) ≤ τ_G(ε).

Theorem 4 bounds the mixing time of the perturbed
graph using the mixing time of the original graph and the
perturbation parameter t. We defer the proof to the Appendix.
Finally, we compute the mixing time of the original and
perturbed graphs using simulations. Figure 6 depicts the total
variation distance between random walks of length x and
the stationary distribution, for the original and perturbed
graphs.¹ We can see that as the perturbation parameter
increases, the total variation distance (and the mixing time)
decreases. Moreover, for small values of the perturbation
parameter, the difference from the original topology is small.
As an aside, it is interesting to note that the variation distance
for the Facebook friendship graph is orders of magnitude
smaller than that for the Facebook interaction graph. This is
because the Facebook interaction graph is a lot sparser,
resulting in slow mixing.

¹ Variation distance has a slight oscillating behavior at odd and even steps
of the random walk; this phenomenon is also observed in Figure 8.

Fig. 5. Degree distribution of nodes using (a) Facebook interaction graph (b) Facebook wall post graph. We can see that the expected degree of each
node after the perturbation process remains the same as in the original graph.

Fig. 6. Total variation distance as a function of random walk length using (a) Facebook interaction graph (b) Facebook wall post graph. We can see that
increasing the perturbation parameter of our algorithm reduces the mixing time of the graph.

VI. Privacy 

We now analyze the link privacy provided by our
perturbation algorithm. We use several notions for
quantifying link privacy, which fall into two categories: (a)
quantifying exact probabilities of de-anonymizing a link
given specific adversarial priors, and (b) quantifying the risk of
de-anonymizing a link without making specific assumptions
about adversarial priors. We also characterize the relationship
between utility and privacy of a perturbed graph.

A. Bayesian formulation for link privacy 

Definition 3: We define the privacy of a link L (or a
subgraph) in the original graph, as the probability of existence
of the link (or a subgraph), as computed by the adversary,
under an optimal attack strategy using its prior information
H: $P(L \mid G', H)$.

Note that low values of the link probability $P(L \mid G', H)$
correspond to high privacy. We cast the problem of computing
the link probability as a Bayesian inference problem. Using
Bayes' theorem, we have that:

$$P(L \mid G', H) = \frac{P(G' \mid L, H) \cdot P(L \mid H)}{P(G' \mid H)}$$

In the above expression, $P(L \mid H)$ is the prior probability of
the link. In Bayesian inference, $P(G' \mid H)$ is a normalization
constant that is typically difficult to compute, but this is not
an impediment for the analysis since sampling techniques can
be used (as long as the numerator of the Bayesian formulation
is computable up to a constant factor) [11]. Our key
theoretical challenge is to compute $P(G' \mid L, H)$.

To compute $P(G' \mid L, H)$, the adversary has to consider all
possible graphs $G_p$ which have the link L and are consistent
with the background information H. Thus, we have that:

$$P(G' \mid L, H) = \sum_{G_p} P(G' \mid G_p) \cdot P(G_p \mid L, H) \qquad (11)$$

The adversary can compute $P(G' \mid G_p)$ using knowledge
of the perturbation algorithm; we assume that the adversary
knows the full details of our perturbation algorithm,
including the perturbation parameter t. Observe that given
$G_p$, edges in G' can be modeled as samples from the
probability distribution of t hop random walks from vertices
in $G_p$. Thus we can compute $P(G' \mid G_p)$ as follows:

(12)

We can compute $P(G' \mid G_p)$ using the t hop transition
probabilities of vertices in $G_p$.

Fig. 7. Cumulative distribution of link probability $P(L \mid G', H)$ (x-axis is
log scale) under worst case prior $H = G - L$ using a synthetic scale-free
topology. Small probabilities offer higher privacy protection.

In general, the number of possible graphs $G_p$ that have L
as a link and are consistent with the adversary's background
information can be very large, and the computation in
Equation 11 then becomes intractable.

For evaluation, we consider a special case of this defini- 
tion: the adversary's prior is the entire original graph without 
the link L (which is the link for which we want to quantify 
privacy). Observe that this is a very powerful adversarial 
prior; we use this prior to shed light on the worst-case link 
privacy using our perturbation algorithm. Under this prior, 
we have that: 



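$$
\begin{aligned}
P(L \mid G', G - L) &= \frac{P(G' \mid L, G - L) \cdot P(L \mid G - L)}{P(G' \mid G - L)} \\
&= \frac{P(G' \mid G) \cdot P(L \mid G - L)}{P(G' \mid G - L)} \\
&= \frac{P(G' \mid G) \cdot P(L \mid G - L)}{\sum_{L'} P(G' \mid G - L + L') \cdot P(L' \mid G - L)}
\end{aligned}
\qquad (13)
$$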
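The last step of Equation 13 is a normalization over the candidate links that could complete the adversary's prior G - L. A minimal sketch of that arithmetic follows, assuming the likelihoods $P(G' \mid G - L + L')$ have already been computed by some other means. The function name `link_posterior`, the dictionary layout and the numbers in the example are hypothetical and only illustrate the normalization.

```python
def link_posterior(likelihoods, priors, true_link):
    """Posterior probability of `true_link` under the worst-case prior.

    likelihoods[c] stands for P(G' | G - L + c) and priors[c] for
    P(c | G - L), for every candidate link c (including the true link).
    """
    # Denominator of Equation 13: sum over all candidate links.
    evidence = sum(likelihoods[c] * priors[c] for c in likelihoods)
    # Numerator of Equation 13: the term belonging to the true link.
    return likelihoods[true_link] * priors[true_link] / evidence


if __name__ == "__main__":
    # Made-up numbers: three candidate links with a uniform prior.
    likelihoods = {"L": 0.02, "L1": 0.01, "L2": 0.01}
    priors = {"L": 1 / 3, "L1": 1 / 3, "L2": 1 / 3}
    print(link_posterior(likelihoods, priors, "L"))
```

When many alternate links explain the perturbed graph about as well as the true link, the posterior of the true link is small, which corresponds to high privacy.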



Using $G - L$ as the adversarial prior constrains the set
of possible $G_p$ to a polynomial number. However, even
in this setting, we found that the above computation is
computationally expensive ($> O(n^3)$) using our real world
social networks. Thus to get an understanding of link privacy
in this setting, we generated a 500 node synthetic scale-free
topology using the preferential attachment methodology of
Nagaraja [22]. The parameters of the scale-free topology
were set using the average degree in the Facebook
interaction graph. Figure 7 depicts the cumulative distribution
for link probability (probability of de-anonymizing a link,
$P(L \mid G', H)$) in this setting (worst case prior) for the
synthetic scale-free topology. We can see that there is significant
variance in the privacy protection received by links in the
topology: for example, using perturbation parameter t = 2,
40% of the links have a link probability less than 0.1 (smaller
is better), while 30% of the links have a probability of
1 (can be fully de-anonymized). Note that this is a worst case
analysis, since we assume that an attacker knows the entire
original graph except the link in question. Furthermore, even
in this setting, as the perturbation parameter t increases, the
privacy protection received by links substantially improves:
for example, using t = 5, all of the links have a link
probability less than 0.01, while 70% of the links have a
link probability of less than 0.001. Thus we can see that
even in this worst case analysis, our perturbation mechanism
offers good privacy protection.

Comparison with Hay et al. [11]: previous work proposed
a perturbation approach where k real edges are deleted and k
fake edges are introduced at random. Even considering k =
m/2 (which would substantially hurt utility, for example, by
introducing a large number of edges between Sybils and honest
users), 50% of the edges in the perturbed graph are real edges
between users; for these edges, $P(L \mid G') = 0.5$. Here we can
see the benefit of our perturbation mechanism: by sampling
fake edges based on the structure of the original graph, we
are able to significantly improve link privacy without hurting
utility.



B. Relationship between privacy and utility 

Intuitively, there is a relationship between link privacy and 
the utility of the perturbed graphs. Next, we formally quantify 
this relationship. 

Theorem 5: Let the maximum vertex utility of the graph
(over all vertices) corresponding to an application parameter
l be $VU_{max}(G, G', l)$. Then for any pair of vertices A
and B, we have that $P(L_{AB} \mid G') \ge f(\delta)$, where $f(\delta)$ denotes
the prior probability of two vertices being friends given that
they are both contained in a $\delta$ hop neighborhood, and $\delta$ is
computed as $\delta = \min k : P^k_{AB}(G') - VU_{max}(G, G', k) > 0$.

Utility measures the change in graph structure between 
original and perturbed graphs. If this change is small (high 
utility), then an adversary can infer information about the 
original graph given the perturbed graph. For a given level 
of utility, the above theorem demonstrates a lower bound on 
link privacy. 

We defer the proof of the above theorem to the Appendix.
The above theorem is a general theorem that holds for all
perturbed graphs. To build some intuition, we specifically
analyze the lower bounds on privacy for our perturbation
algorithm (where parameter t governs utility), using the
transition probability between A and B in the perturbed graph
G' as a feature to assign probabilities to nodes A and B of
being friends in the original graph G. We are interested in
the quantity $P(L_{AB} \mid P^k_{AB}(G') > x)$, for different values of
k. Using Bayes' theorem, we have that:

$$P(L_{AB} \mid P^k_{AB}(G') > x) = \frac{P(P^k_{AB}(G') > x \mid L_{AB}) \cdot P(L_{AB})}{P(P^k_{AB}(G') > x)} \qquad (14)$$

Fig. 8. Median transition probability between two vertices in transformed graph when (a) two vertices were neighbors (friends) in the original graph and
(b) two vertices were not neighbors in the original graph, for the Facebook interaction graph.

Fig. 9. Complementary cumulative distribution for $P^k_{AB}(G')$ using (a) perturbation parameter t = 2, and (b) perturbation parameter t = 5, and (c)
perturbation parameter t = 10 for the Facebook interaction graph.



Also, we have that:

$$P(P^k_{AB}(G') > x) = P(P^k_{AB}(G') > x \mid L_{AB}) \cdot P(L_{AB}) + P(P^k_{AB}(G') > x \mid \overline{L_{AB}}) \cdot P(\overline{L_{AB}}) \qquad (15)$$

where $\overline{L_{AB}}$ denotes the event that vertices A and B do not
have a link.
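Equations 14 and 15 combine into a two-hypothesis Bayes update. The sketch below shows that arithmetic under the assumption that the two conditional probabilities have already been estimated (for example, from simulations). The function name `posterior_link_given_threshold` and the example values are hypothetical.

```python
def posterior_link_given_threshold(p_exceed_if_link, p_exceed_if_no_link, prior_link):
    """P(L_AB | P^k_AB(G') > x) from Equations 14 and 15.

    p_exceed_if_link    -- P(P^k_AB(G') > x | L_AB)
    p_exceed_if_no_link -- P(P^k_AB(G') > x | no link between A and B)
    prior_link          -- P(L_AB)
    """
    # Equation 15: total probability of observing P^k_AB(G') > x.
    evidence = (p_exceed_if_link * prior_link
                + p_exceed_if_no_link * (1.0 - prior_link))
    # Equation 14: Bayes' theorem.
    return p_exceed_if_link * prior_link / evidence


if __name__ == "__main__":
    # Made-up inputs: the event is 9x more likely for friends, prior of 1%.
    print(posterior_link_given_threshold(0.9, 0.1, 0.01))
```

With a small prior, even a feature that strongly separates friends from non-friends yields a modest posterior, which is why the choice of prior matters for the resulting privacy estimate.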

To get some insight, we computed the probability
distributions $P(P^k_{AB}(G') \mid L_{AB})$ and $P(P^k_{AB}(G') \mid \overline{L_{AB}})$ using
simulations on our real world social network topologies.
Figure 8 depicts the median value of the respective probability
distributions, as a function of parameter k, for different
perturbation parameters t, using the Facebook interaction
graph. We can see that the median transition probabilities
are higher for the scenario where two users are originally
friends (as opposed to the setting where they are not friends).
We can also see that the difference between median transition
values in the two scenarios is higher when (a) the perturbation
parameter t is small, and (b) the parameter k (random walk
length) is small. This difference is related to the privacy of
the link: the larger the difference, the greater the loss in privacy.
The insight from this figure is that for small perturbation
parameters, the closer two nodes are to each other in the
perturbed graph, the higher their chances of being friends in
the original graph.

Fig. 10. Cumulative distribution of link probability $P(L \mid G', H)$ for
the Facebook interaction graph. Smaller probabilities offer higher privacy
protection.



Next, we consider the full distribution of transition
probabilities (as opposed to only the median values discussed
above). Figure 9(a) depicts the complementary cumulative
distribution $P(P^k_{AB}(G') > x)$ using perturbation parameter
t = 2 for the Facebook interaction graph. Again, we can see
that when A and B are friends, they have higher transition
probability to each other in the perturbed graph, compared
to the scenario when A and B are not friends. Moreover, as
the value of k increases, the gap between the distributions
becomes smaller. Similarly, as the value of t increases, the
gap between the distributions becomes smaller, as depicted in
Figures 9(b-c). We can use these simulation results to compute
the probabilities in Equation 14. One way to analyze
the lower bound on link privacy would be to choose a uniform
prior for vertices being friends in the original graph, i.e.,
$P(L_{AB}) = m / \binom{n}{2}$, where m and n denote the number of links
and vertices in the graph. Correspondingly, $P(\overline{L_{AB}}) = 1 - P(L_{AB})$.



Figure 10 depicts the cumulative distribution of link
probability computed using the above methodology, for the
Facebook interaction graph. We can see the variance in
privacy protection received by links in the topology. Using
t = 2, 80% of the links have a link probability of less
than 0.1. Increasing the perturbation parameter t significantly
improves privacy; using t = 3, 95% of the links have
a link probability less than 0.1, and 90% of the links have
a link probability less than 0.01. In summary, our analysis
shows that a given level of utility translates into a lower bound
on privacy offered by the perturbation mechanism.

C. Risk based formulation for link privacy 

The dependence of the Bayesian inference based privacy 
definitions on the prior of the adversary motivates the for- 
mulation of new metrics that are not specific to adversarial 
priors. We first illustrate a preliminary definition of link 
privacy (which is unable to account for links to degree 1 
vertices), and then subsequently improve it. 

Definition 4: We define the structural impact (SI) of a link
L in graph G with respect to a perturbation mechanism M,
as the statistical distance between probability distributions
of the output of the perturbation mechanism (i.e., the set
of possible perturbed graphs) when using (a) the original
graph G as an input to the perturbation mechanism, and (b)
the graph G - L as an input to the perturbation mechanism.
Let $P(G' = M(G))$ denote the probability distribution of
perturbed graphs G' using the perturbation mechanism M
and input graph G. A link has $\epsilon$ SI-privacy if the statistical
distance $\|P(G' = M(G)) - P(G' = M(G - L))\| < \epsilon$.

Intuitively, if the SI of a link is high, then the perturbation 
process leaks more information about that link. On the other 
hand, if the SI of a link is low, then the perturbed graph G' 
leaks less information about the link. 
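As a rough illustration of the structural impact idea, the sketch below computes the worst-case total variation distance between the t hop random walk distributions of G and G - L, which is one way to upper-bound how differently the two inputs can influence the sampled edges. The toy graph, the helper names and the choice to let a walk stay in place at an isolated vertex are all assumptions made for this example.

```python
def t_hop_distribution(adj, start, t):
    """Distribution of a random walk after t hops from `start`."""
    n = len(adj)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(t):
        nxt = [0.0] * n
        for u, mass in enumerate(dist):
            if mass == 0.0:
                continue
            nbrs = [v for v in range(n) if adj[u][v]]
            if not nbrs:
                # Assumption: a walk at an isolated vertex stays where it is.
                nxt[u] += mass
                continue
            for v in nbrs:
                nxt[v] += mass / len(nbrs)
        dist = nxt
    return dist


def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def remove_link(adj, link):
    """Copy of the adjacency matrix with the undirected link removed."""
    a, b = link
    out = [row[:] for row in adj]
    out[a][b] = out[b][a] = 0
    return out


def structural_impact_bound(adj, link, t):
    """Worst-case (over start vertices) distance between the t hop
    distributions of G and G - L."""
    reduced = remove_link(adj, link)
    return max(
        tvd(t_hop_distribution(adj, v, t), t_hop_distribution(reduced, v, t))
        for v in range(len(adj))
    )


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
TOY = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(structural_impact_bound(TOY, (0, 1), 1))  # link inside the triangle
    print(structural_impact_bound(TOY, (2, 3), 1))  # link to the degree 1 vertex
```

In this toy example the link to the degree 1 vertex attains the maximum possible distance, since removing it disconnects that vertex from the rest of the graph.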

As before, we consider the total variation distance as our
distance metric between probability distributions. Observe
that the links in graphs G' are samples from the probability
distribution $P^t_v(G)$, for $v \in V$. So we can bound the
difference in probability distributions of perturbed graphs
generated from G and G - L, by the worst case total variation
distance between probability distributions $P^t_v(G)$ and
$P^t_v(G - L)$, over all $v \in V$, i.e., $VU_{max}(G, G - L, t)$.

Note that our preliminary attempt at defining link privacy
above does not accommodate links for which either endpoint
has degree 1, since removal of such a link disconnects
the graph. This is illustrated in Figure 11, which depicts the
cumulative distribution of $\epsilon$ SI link privacy values. We can
see that approximately 30% and 20% of links in the Facebook
interaction graph and friendship graph respectively do not
receive any privacy protection under this definition (because
they are connected to degree 1 vertices). For the remaining
links, we can see a similar qualitative trend as before:
increasing the perturbation parameter t significantly improves
link privacy. To overcome the limitations of this definition,
we propose an alternate formulation based on the notion of
link equivalence.

Definition 5: We define the structural equivalence (SE)
between a link L' and a link L in graph G with respect to a
perturbation mechanism M, as the statistical distance between
probability distributions of the output of the perturbation
mechanism, when using (a) the original graph G as an input
to the perturbation mechanism, and (b) the graph G - L +
L' as the input to the perturbation mechanism. A link L has
K-anonymous $\epsilon$ SE privacy, if there exist at least K links L',
such that $\|P(G' = M(G)) - P(G' = M(G - L + L'))\| < \epsilon$.

Observe that this definition of privacy is able to account 
for degree 1 vertices, since they can become connected 
to the graph via the addition of other alternate links L'. 
For our experiments, we limited the number of alternate 
links explored to 1000 links (for computational tractability). 



Figure 12 depicts the cumulative distribution of anonymity
set sizes for links using $\epsilon = 0.1$ for the Facebook interaction
and friendship graphs. For t = 2 we see a very similar trend
as in the previous definition, where a non-trivial fraction of
links do not receive much privacy. Unlike the previous setting
however, as we increase the perturbation parameter t, the
anonymity set size for even these links improves significantly.
Using t = 10, 50% and 70% of the links in the interaction
and friendship graphs respectively achieved the maximum
tested anonymity set size of 1000 links.

Connection with differential privacy [8]: there is an
interesting connection between our risk based privacy definitions
and differential privacy. A differentially private mechanism
adaptively adds noise to the system to ensure that all user
records in a database (links in our setting) receive a threshold
privacy protection. In our mechanism, we are adding a fixed
amount of noise (governed by the perturbation parameter
t), and observing the variance in $\epsilon$ and anonymity set size
values.

VII. Applications 

In this section, we demonstrate the applicability of our 
perturbation mechanism to social network based systems. 

A. Secure Routing 

Several peer-to-peer systems perform routing over social
graphs to improve performance and security, such as
Sprout [16], Tribler [24], Whanau [13] and X-Vine [19]. Next,
we analyze the impact of our perturbation algorithm on the
utility of Sprout.

1) Sprout: Sprout is a routing system that enhances the
security of conventional DHT routing by leveraging trusted
social links when available. For example, when routing
towards a DHT key, if leveraging a social link makes forward
progress in the DHT namespace, then the social link is used
for routing. The authors of Sprout considered a linear trust
decay model, where a user's social contacts are trusted with
probability f (set to 0.95 in [16]), and the trust in other users
decreases as a linear function of the shortest path distance
between the users (a decrement of 0.05 is used in [16]). The
decrement is bounded by a value that reflects the probability
of a random user in the network being trusted (set to 0.6 in
[16]).

TABLE I
Path reliability using Sprout for a linear trust decay model.

The reliability of a DHT lookup in Sprout is defined as
the probability of all users in the path being trusted. Table I
depicts the reliability of routing using a single DHT lookup
in Sprout, for the original and the perturbed topologies.
We used Chord as the underlying DHT system. For each
perturbation parameter, our results were averaged over 100
perturbed topologies. We can see that as the perturbation
parameter increases, the utility of the application decreases. For
example, using the original Facebook interaction topology,
the reliability of a single DHT path in Sprout is 0.11, which
drops to 0.10 and 0.096 when using perturbed topologies with
parameters t = 5 and t = 10 respectively. However, even
when using t = 10, the performance is better than in
the scenario where social links are not used for routing
(Chord's baseline performance of 0.075). We can see similar
results for the Facebook friendship graph as well.
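The linear trust decay model and the path reliability metric described above can be sketched in a few lines. This is an interpretation rather than Sprout's implementation: it assumes that users at social distance 1 are trusted with probability f, that each additional hop subtracts the decrement, and that the trust never drops below the floor for a random user. The function names are hypothetical.

```python
def trust(distance, f=0.95, decrement=0.05, floor=0.6):
    """Trust in a user at the given shortest path distance (>= 1).

    Assumed reading of the model: direct contacts get `f`, every extra
    hop subtracts `decrement`, and `floor` is the trust in a random user.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(f - decrement * (distance - 1), floor)


def path_reliability(distances, **params):
    """Probability that all users on a lookup path are trusted, given the
    social distance from the initiator to each user on the path."""
    reliability = 1.0
    for d in distances:
        reliability *= trust(d, **params)
    return reliability


if __name__ == "__main__":
    # Hypothetical lookup path: two direct contacts, then a friend of a friend.
    print(path_reliability([1, 1, 2]))
```

Under this reading, a perturbation that lengthens the social distance to the users on a lookup path lowers the product, which matches the qualitative trend reported for Table I.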



B. Sybil detection 

In a Sybil attack [7], a single user or an entity emulates the
behavior of multiple identities in a system. Many systems are 
built on the assumption that there is a bound on the fraction of 
malicious nodes in the systems. By being able to insert a large 
number of malicious identities, an attacker can compromise 
the security properties of such systems. Sybil attacks are a 
powerful threat against both centralized as well as distributed 
systems, such as reputation systems, consensus and leader 
election protocols, peer-to-peer systems, anonymity systems, 
and recommendation systems. 

A wide body of recent work has proposed to leverage trust 
relationships embedded in social networks for detecting Sybil 
attacks ED, G2, 0, (20), EH). However, in all of these 
mechanisms, an adversary can learn the trust relationships 
in the social network. Next, we show that our perturbation
algorithm preserves the ability of the above mechanisms to detect
Sybils, while protecting the privacy of the social network trust
relationships.

1) SybilLimit: We use SybilLimit [32] as a representative
protocol for Sybil detection, since it is the most popular and
well understood mechanism in the literature. SybilLimit is
a probabilistic defense, and has both false positives (honest
users misclassified as Sybils) and false negatives (Sybils
misclassified as honest users).

We compared the performance of running SybilLimit on the
original graph, and on the transformed graph, for varying
perturbation parameters. For each perturbation parameter, we
averaged the results over 100 perturbed topologies. Figure 13
depicts the percentage of honest users validated by SybilLimit
using the original graph and perturbed graphs, as a
function of the SybilLimit random route length (application
parameter w). For any value of the SybilLimit random
route length, the percentage of honest nodes accepted by
SybilLimit is higher when using perturbed graphs. This is
because our perturbation algorithms reduce the mixing time
of the graph. In fact, for a false positive percentage of
1-2% (99-98% accepted nodes), the required length of the
SybilLimit random routes is a factor of 2-3 smaller
compared to the original topology. The SybilLimit random route
length is directly proportional to the number of Sybil identities that
can be inserted in the system.

Fig. 13. SybilLimit % validated honest nodes as a function of SybilLimit random route length for (a) Facebook interaction graph (b) Facebook wall post
graph. We can see that for a false positive rate of 1-2%, the required random route length for our perturbed topologies is a factor of 2-3 smaller
compared with the original topology. Random route length is directly proportional to the number of Sybil identities that can be inserted in the system.

Fig. 14. Attack edges in perturbed topologies as a function of attack edges in the original topology. We can see that there is a marginal increase in the
number of attack edges in the perturbed topologies. The attack edges are directly proportional to the number of Sybil identities that can be inserted in the
system.

Fig. 15. Number of Sybil identities accepted by SybilInfer as a function of number of compromised nodes in the original topology. We can see that there
is a significant decline in the number of Sybil identities using our perturbation algorithms.

This improvement in the number of accepted honest nodes
(reduction in false positives) comes at the cost of an increase in
the number of attack edges between honest users and the
attacker. Figure 14 depicts the number of attack edges in
the perturbed topologies, for varying values of attack edges
in the original graph. We can see that, as expected, there is
a marginal increase in the number of attack edges in the
perturbed topologies.

Remark: The number of Sybil identities that an adversary
can insert is given by $S = g' \cdot w'$. We note that the marginal
increase in the number of attack edges g' is offset by the
reduced length of the SybilLimit random route parameter w
(for any desired false positive rate), thus achieving comparable
performance with the original social graph. In fact, for
perturbed topologies, since the required random route length
in SybilLimit is halved for a false positive rate of 1-2%,
and the increase in the number of attack edges is less than
a factor of two, the Sybil defense performance has improved
using our perturbation mechanism. Thus for Sybil defenses,
our perturbation mechanism is of independent interest, even
without considering the benefit of link privacy. We further
validate this conclusion using another state-of-the-art detection
mechanism called SybilInfer.
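The trade-off in the remark can be checked with simple arithmetic: the bound on accepted Sybil identities is the product of the number of attack edges and the random route length, so it shrinks whenever the route length falls by a larger factor than the attack edges grow. The numbers below are made up purely for illustration, and the function name `sybil_bound` is hypothetical.

```python
def sybil_bound(attack_edges, route_length):
    """Number of Sybil identities an adversary can insert: S = g * w."""
    return attack_edges * route_length


if __name__ == "__main__":
    # Hypothetical original topology: 100 attack edges, route length 20.
    original = sybil_bound(100, 20)
    # Hypothetical perturbed topology: attack edges grow by 1.5x (less than
    # a factor of two) while the required route length is halved.
    perturbed = sybil_bound(150, 10)
    print(original, perturbed)
```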

2) SybilInfer: We compared the performance of running
SybilInfer on real and perturbed topologies. Figure 15 depicts
the optimal number of Sybil identities that an adversary can
insert before being detected by SybilInfer, as a function
of real compromised and colluding users. Again, we can
see that the performance of perturbed graphs is better than
using original graphs. This is due to the interplay between
mixing time of graphs and the number of attack edges in
the Sybil defense application. Our perturbation mechanism
significantly reduces the mixing time of the graphs, while
suffering only a marginal increase in the number of attack
edges. It is interesting to see that the advantage of using our
perturbation mechanism is smaller in Figure 15(b), as compared
to Figure 15(a). This is because the mixing time of the
Facebook friendship graph is much better (as compared with
the mixing time of the Facebook interaction graph). Thus
we conclude that our perturbation mechanism improves the
overall Sybil detection performance of existing approaches,
especially for interaction based topologies that exhibit
relatively poor mixing characteristics.

VIII. Conclusion and Future Work 

In this work, we proposed a random walk based pertur- 
bation algorithm that anonymizes links in a social graph 
while preserving the graph theoretic properties of the orig- 
inal graph. We provided formal definitions for utility of a 
perturbed graph from the perspective of vertices, related our 
definitions to global graph properties, and empirically ana- 
lyzed the properties of our perturbation algorithm using real 
world social networks. Furthermore, we analyzed the privacy
of our perturbation mechanism from several perspectives: (a)
a Bayesian viewpoint that takes into consideration a specific
adversarial prior, and (b) a risk based viewpoint that is
independent of the adversary's prior. We also formalized the
relationship between utility and privacy of perturbed graphs.
Finally, we experimentally demonstrated the applicability 
of our techniques on applications such as Sybil defenses 
and secure routing. For Sybil defenses, we found that our 
techniques are of independent interest. 

Our work opens several directions for future research,
including (a) investigating the applicability of our techniques
on directed graphs, (b) modeling closed form expressions
for computing link privacy using the Bayesian framework,
(c) investigating tighter bounds on $\epsilon$ for computing
link privacy in the risk based framework, and (d) modeling
temporal dynamics of social networks in quantifying link
privacy.

By protecting the privacy of trust relationships, we believe
that our perturbation mechanism can act as a key enabler for
real world deployment of secure systems that leverage social
links.

Appendix

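A. Proof of Theorem 1: Relating vertex utility and mixing
time

We now sketch the proof of the above theorem. From the
definition of total variation distance, we can see that:

$$\|P^t_v(G') - \pi\|_{tvd} \le \|P^t_v(G') - P^t_v(G)\|_{tvd} + \|P^t_v(G) - \pi\|_{tvd} \qquad (16)$$

From the definition of mixing time, we have that $\forall t \ge \tau_G(\epsilon)$:

$$\|P^t_v(G') - \pi\|_{tvd} \le \|P^t_v(G') - P^t_v(G)\|_{tvd} + \epsilon \qquad (17)$$

Substituting $t = \tau_G(\epsilon)$ in the above equation, and taking
the maximum over all vertices, we have that:

$$
\begin{aligned}
\max_v \|P^{\tau_G(\epsilon)}_v(G') - \pi\|_{tvd} &\le \max_v \|P^{\tau_G(\epsilon)}_v(G') - P^{\tau_G(\epsilon)}_v(G)\|_{tvd} + \epsilon \\
\max_v \|P^{\tau_G(\epsilon)}_v(G') - \pi\|_{tvd} &\le VU_{max}(G, G', \tau_G(\epsilon)) + \epsilon
\end{aligned}
\qquad (18)
$$

Finally, we have that:

$$\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon))) \le \tau_G(\epsilon) \qquad (19)$$

B. Proof of Theorem 2: Relating vertex utility and SLEM

We now sketch the proof of the above theorem. It is known
that for undirected graphs, the second largest eigenvalue
modulus is related to the mixing time of the graph as
follows:

$$\frac{\mu_{G'}}{2(1 - \mu_{G'})} \log\left(\frac{1}{2\epsilon}\right) \le \tau_{G'}(\epsilon) \le \frac{\log n + \log(\frac{1}{\epsilon})}{1 - \mu_{G'}} \qquad (20)$$

From the above equation, we can bound the SLEM in terms
of the mixing time as follows:

$$1 - \frac{\log n + \log(\frac{1}{\epsilon})}{\tau_{G'}(\epsilon)} \le \mu_{G'} \le \frac{2\tau_{G'}(\epsilon)}{2\tau_{G'}(\epsilon) + \log(\frac{1}{2\epsilon})} \qquad (21)$$

Replacing $\epsilon$ with $\epsilon + VU_{max}(G, G', \tau_G(\epsilon))$, we have that:

$$1 - \frac{\log n + \log\left(\frac{1}{\epsilon + VU_{max}(G, G', \tau_G(\epsilon))}\right)}{\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon)))} \le \mu_{G'} \le \frac{2\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon)))}{2\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon))) + \log\left(\frac{1}{2\epsilon + 2VU_{max}(G, G', \tau_G(\epsilon))}\right)} \qquad (22)$$

Finally, we leverage $\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon))) \le \tau_G(\epsilon)$
in the above equation, to get:

$$1 - \frac{\log n + \log\left(\frac{1}{\epsilon + VU_{max}(G, G', \tau_G(\epsilon))}\right)}{\tau_{G'}(\epsilon + VU_{max}(G, G', \tau_G(\epsilon)))} \le \mu_{G'} \le \frac{2\tau_G(\epsilon)}{2\tau_G(\epsilon) + \log\left(\frac{1}{2\epsilon + 2VU_{max}(G, G', \tau_G(\epsilon))}\right)} \qquad (23)$$

C. Proof of Theorem 4: Bounding mixing time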



Observe that the edges in graph G' can be modeled as
samples from the t hop probability distribution of random
walks starting from vertices in G. We will prove the lower
bound on the mixing time of the perturbed graph G' by
contradiction: let us suppose that the mixing time of the graph
G' is $k < \frac{\tau_G(\epsilon)}{t}$. Then in the original graph G, a user could
have performed random walks of length $k \cdot t$ and achieved a
variation distance less than $\epsilon$. But $k \cdot t < \tau_G(\epsilon)$, which is a
contradiction. Thus, we have that $\frac{\tau_G(\epsilon)}{t} \le \tau_{G'}(\epsilon)$.
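The step in this argument that equates k hops in G' with $k \cdot t$ hops in G can be illustrated numerically. The sketch below makes a simplifying assumption that goes beyond the text: it treats the perturbed graph as if its transition matrix were exactly the t hop transition matrix of G, whereas the real G' is only a random sample from that distribution. The toy graph and helper names are hypothetical.

```python
def transition_matrix(adj):
    """Row-stochastic random walk matrix of an undirected graph."""
    return [[a / sum(row) for a in row] for row in adj]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, exponent):
    """m raised to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = mat_mul(result, m)
    return result


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
TOY = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    t, k = 3, 2
    P = transition_matrix(TOY)
    idealized_perturbed = mat_pow(P, t)         # one hop in the idealized G'
    k_hops_in_perturbed = mat_pow(idealized_perturbed, k)
    kt_hops_in_original = mat_pow(P, k * t)
    print(k_hops_in_perturbed[0])
    print(kt_hops_in_original[0])
```

Under this idealization, a walk of length k in G' cannot be closer to the stationary distribution than a walk of length $k \cdot t$ in G, which is the intuition behind the lower bound.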

We prove an upper bound on the mixing time of the perturbed
graph using the notion of graph conductance. Let us denote
the number of edges across the bottleneck cut (say S) of the
original topology as g. Observe that the t hop conductance
between the sets S and $\overline{S}$ is strictly larger than the
corresponding one hop conductance (since S is the bottleneck
cut in the original topology). Thus, $E(g') > g$, where g' denotes
the number of edges across the same cut in G'. Hence the
expected graph conductance is an increasing function of the
perturbation parameter t, and thus $E(\tau_{G'}(\epsilon)) \le \tau_G(\epsilon)$.



D. Proof of Theorem 5: Relating utility and privacy

From the definition of maximum vertex utility, we have
that $|P^l_{AB}(G) - P^l_{AB}(G')| \le VU_{max}(G, G', l)$.
Thus, we can bound $P^l_{AB}(G)$ as follows:

$$P^l_{AB}(G') - VU_{max}(G, G', l) \le P^l_{AB}(G) \le P^l_{AB}(G') + VU_{max}(G, G', l) \qquad (24)$$

Thus for any value of k, if $P^k_{AB}(G') - VU_{max}(G, G', k) > 0$,
then we have that the lower bound on the probability
$P^k_{AB}(G) > 0$, which reveals the information that A and B
are within a k hop neighborhood of each other. Thus the
maximum information is revealed when the value of k is
minimized, while maintaining $P^k_{AB}(G') - VU_{max}(G, G', k) > 0$,
i.e., $k = \delta$. This gives us a lower bound on the probability
of A and B being friends in the original graph: the prior
probability that two vertices in a $\delta$ hop neighborhood are
friends, $f(\delta)$. Let $m_\delta$ denote the average number of links in
a $\delta$ hop neighborhood, and let $n_\delta$ denote the average number
of vertices in a $\delta$ hop neighborhood. In the special case of a
null prior, we have that $f(\delta) = m_\delta / \binom{n_\delta}{2}$.



